5) Schema Design - From the Absolute Basics to the Advanced Level

August 12, 2025

5) Schema Design - From the Absolute Basics to the Advanced Level

Now, let’s break Schema Design down from beginner to advanced so you’ll understand how to plan, build, and optimize database structures that are efficient, scalable, and easy to work with.

1. What is Schema Design?

A schema is the blueprint of a database. It defines:

Tables (entities)
Columns (attributes)
Data types
Relationships (foreign keys)
Constraints (rules)

Good schema design makes your queries faster, data more reliable and future changes easier.

2. Beginner Level — Core Concepts

2.1 Identify Entities

Entities are things you want to store data about.
Example (E-commerce):

Customers
Products
Orders

2.2 Determine Attributes

For each entity, list its properties.
Example (Customers table):

customer_id (Primary Key)
name
email
phone

2.3 Choose Data Types

Use the smallest data type that fits your data.
Example:

INT for IDs
VARCHAR(255) for text
DATE for dates
DECIMAL(10,2) for money

2.4 Define Primary Keys

Each table must have a Primary Key: a unique identifier.

CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255)
);

2.5 Define Relationships

One-to-Many → One customer can place many orders.
Many-to-Many → Products appear in many orders, orders have many products.

Example:

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);

3. Intermediate Level - Normalization & Keys

3.1 Normalization

Breaking data into multiple related tables to reduce redundancy and improve consistency.

Example: Bad (Unnormalized)

Orders(order_id, customer_name, customer_email, product_name, product_price)

Problems:

Customer info repeated for each order.
If email changes, must update many rows.

Normalized (3NF):

Customers(customer_id, name, email)
Products(product_id, name, price)
Orders(order_id, customer_id)
Order_Items(order_id, product_id, quantity)

3.2 Foreign Keys & Referential Integrity

Foreign keys ensure related data exists.
Prevents orphaned records.

FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)

3.3 Constraints

NOT NULL → field must have a value.
UNIQUE → no duplicate values allowed.
CHECK → restrict allowed values.

CHECK (price > 0)

3.4 Composite Keys

Primary key made of multiple columns.
Example:
Order_Items(order_id, product_id) - unique combination for each order.

4. Advanced Level - Performance & Scalability

4.1 Index Design in Schema

Plan indexes during schema creation, not after performance problems appear.
Example:

CREATE INDEX idx_customer_email ON Customers(email);

4.2 Denormalization for Speed

Sometimes, store redundant data for faster reads.
Example:

Store total_amount in Orders instead of calculating from Order_Items every time.

Trade-off: Faster reads but risk of inconsistency if not updated properly.

4.3 Partitioning

Split large tables into parts for faster queries.
Example:

PARTITION BY RANGE (YEAR(order_date))

4.4 Handling Many-to-Many Relationships Efficiently

Use join tables with indexes.
Example:

CREATE TABLE Order_Items (
    order_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id)
);

4.5 Schema for Analytics vs. Transactions

OLTP (Online Transaction Processing) → normalized, fast writes.
OLAP (Online Analytical Processing) → denormalized, optimized for queries.

4.6 Versioning & Soft Deletes

Instead of deleting data, mark it as inactive:

is_deleted BOOLEAN DEFAULT FALSE

Keeps historical records for audits.

5. Example: E-commerce Schema (Balanced for Speed & Consistency)

CREATE TABLE Customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE Products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price DECIMAL(10,2) CHECK (price > 0)
);

CREATE TABLE Orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    total_amount DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);

CREATE TABLE Order_Items (
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES Orders(order_id),
    FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

6. Best Practices Summary Table

Schema Design Practice Quiz going from easy → hard, covering concepts, coding, and real-world optimizations.
I’ve included answers + explanations so you can check your work.

📚 Schema Design Practice Quiz

🟢 Easy Level — Fundamentals

1. Identify at least three entities for a Library Management System.
Answer:

Books
Members
Loans
(You might also include: Authors, Publishers)
Explanation: Entities are the main "things" in your domain.

2. Which data type would you use for:
a) Book title
b) Loan date
c) Member ID

Answer:
a) VARCHAR(255)
b) DATE
c) INT or BIGINT (depending on expected size)
Explanation: Choose the smallest data type that fits the data.

🟡 Medium Level — Normalization & Keys

3. This table is unnormalized:

Orders(order_id, customer_name, customer_email, product_name, product_price, quantity)

Normalize it to 3NF.

Answer:

Customers(customer_id, name, email)
Products(product_id, name, price)
Orders(order_id, customer_id, order_date)
Order_Items(order_id, product_id, quantity)

Explanation:

Remove repeating customer and product data.
Separate into related tables with foreign keys.

4. Which is better for a many-to-many relationship between Students and Courses?

a) Add multiple course_id columns to Students table
b) Create a Student_Courses table

Answer: b)
Explanation: Many-to-many relationships need a join table for flexibility and scalability.

5. Design a Order_Items table that ensures no duplicate product in the same order.

Answer:

CREATE TABLE Order_Items (
    order_id INT,
    product_id INT,
    quantity INT CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES Orders(order_id),
    FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

Explanation: Composite primary key ensures each (order_id, product_id) is unique.

🔴 Hard Level — Optimization & Scalability

6. Your Orders table has 500M rows. You mostly query last 30 days of data. Suggest two schema-level optimizations.

Answer:

Partition the table by order_date (monthly/weekly).
Create a partial index:

CREATE INDEX idx_recent_orders ON Orders(order_date)
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';

Explanation: Reduces scanned rows, improving speed.

7. The Invoices table has this query:

SELECT * FROM Invoices WHERE YEAR(invoice_date) = 2024;

Explain why it’s slow and redesign the schema/index to improve it.

Answer:

Wrapping invoice_date in YEAR() prevents index use.
Redesign query:

SELECT * 
FROM Invoices
WHERE invoice_date >= '2024-01-01'
  AND invoice_date < '2025-01-01';

Add index:

CREATE INDEX idx_invoice_date ON Invoices(invoice_date);

Explanation: Index can now be used for range scans.

8. You have a Users table for an OLTP system and a reporting dashboard that runs slow aggregations. Suggest two schema designs to separate transactional and analytical workloads.

Answer:

OLTP schema → normalized, small indexes, fast writes.
OLAP schema → denormalized star schema with pre-aggregated data in a separate reporting table.
Explanation: Prevents heavy analytics from slowing transactions.

9. Your Products table needs to store historical price changes without losing old values. Design the schema.

Answer:

CREATE TABLE Product_Prices (
    product_id INT,
    price DECIMAL(10,2),
    start_date DATE,
    end_date DATE,
    PRIMARY KEY (product_id, start_date),
    FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

Explanation: Use a temporal table pattern to track history.

10. You have a Logs table with billions of rows. You must keep 5 years of data but queries for recent month must be lightning fast. Suggest three schema changes.

Answer:

Partition by month or year.
Clustered index on (log_date, id).
Hot-cold storage: Move old data to cheaper storage but keep last month in fast storage.
Explanation: Matches storage strategy to access frequency.

✅ Key Takeaways from This Quiz:

Start with normalization to remove redundancy.
Then add indexes, partitions, and caching for performance.
Adjust schema differently for transactional vs analytical use cases.
Design with future scalability in mind.

Search This Blog

Curious Mind